Distributed model execution

ABSTRACT

Distributed model execution, including: identifying, for each model of a plurality of models, based on one or more execution constraints for the plurality of models, a corresponding node of a plurality of nodes, wherein the plurality of nodes each comprise one or more computing devices or one or more virtual machines; deploying each model of the plurality of models to the identified corresponding node of the plurality of nodes; and wherein the plurality of models are configured to generate, based on data input to at least one model of the plurality of models, a prediction associated with the data.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a non-provisional application for patent entitled to a filing date and claiming the benefit of earlier-filed U.S. Provisional Patent Application Ser. No. 62/976,965, filed Feb. 14, 2020.

This application is related to co-pending U.S. patent application docket Ser. No. SC0010US01, filed Feb. 16, 2021, and co-pending U.S. patent application docket Ser. No. SC0011US01, filed Feb. 16, 2021, each of which is incorporated by reference in its entirety.

BACKGROUND

Machine learning models may be used to perform various data analysis applications. A client or consumer may not have the hardware or software resources available on-premises to perform computationally intensive predictions or handle large amounts of data.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram of an example system for distributed model execution according to some embodiments.

FIG. 2 is a diagram of model dependencies for distributed model execution according to some embodiments.

FIG. 3 is a block diagram of an example execution environment for distributed model execution according to some embodiments.

FIG. 4 is a flowchart of an example method for distributed model execution according to some embodiments.

FIG. 5 is a flowchart of another example method for distributed model execution according to some embodiments.

FIG. 6 is a flowchart of another example method for distributed model execution according to some embodiments.

FIG. 7 is a flowchart of another example method for distributed model execution according to some embodiments.

DETAILED DESCRIPTION

Machine learning models may be used to perform various data analysis applications. For example, one or more machine learning models may be used to generate predictions or other analysis based on input data. Machine learning models may be logically integrated such that the outputs of some models are provided as inputs to other models, ultimately resulting in a model providing an output as the prediction.

A client or consumer may not have the hardware or software resources available on-premises to perform computationally intensive predictions or handle large amounts of data. To address these shortcomings, a client may provide the machine learning models used to generate a prediction to off-site or remote resources, such as remote data centers, cloud computing environments, and the like. These remote resources may have access to hardware such as Graphics Processing Units (GPUs), Field-Programmable Gate Arrays (FPGAs), or other devices that the models may leverage to accelerate their performance. The resulting prediction or output may then be provided back to a client.

As will be described in more detail below, the execution of a given model may be performed by a node. Such a node may include a computing device, a virtual machine, or other device as can be appreciated. The models may be deployed for execution to a given node based on various criteria, including the hardware and software resources available to a node, the type of data or calculations used by the model, authorization requirements, model dependencies, and the like. Once deployed, the models may be used for distributed processing of data in order to generate a prediction for a client.

FIG. 1 is a block diagram of a non-limiting example system for distributed model execution. The example system includes a model execution environment 106. The model execution environment 106 includes a plurality of nodes 108 a-n. Each node 108 a-n is an allocation of hardware and software resources, including storage resources (e.g., storage devices, memory, and the like), processing resources (e.g., processors, hardware accelerators such as GPUs, FPGAs, and the like), software resources (e.g., operating systems, software applications, and the like), and other resources as can be appreciated to facilitate distributed model execution. Each node 108 a-n may include one or more computing devices, one or more virtual machines, or other allocations of resources as can be appreciated. Each node 108 a-n may be communicatively coupled to another node 108 a-n using various communications resources, including buses, wired or wireless networks, and the like.

The system of FIG. 1 also includes a management node 102. The management node 102 is similar to the nodes 108 a-n in that the management node 102 may include a computing device, virtual machine, and the like. The management node 102 is communicatively coupled to the model execution environment 106. Although the management node 102 is shown as separate from the model execution environment 106, it is understood that the management node 102 may be located remote from or proximate to the model execution environment 106. For example, the management node 102 and the model execution environment 106 may be implemented in the same or separate data centers, cloud computing environments, and the like.

Also included in the system of FIG. 1 is a client device 112. The client device 112 provides, to the model execution environment 106, a plurality of models 110 a-n for execution in the plurality of nodes 108 a-n. Although FIG. 1 shows each model 110 a-n allocated to and executed in a respective node 108 a-n, it is understood that other configurations and allocations of nodes 108 a-n are possible. For example, a node 108 a-n may be allocated execution of multiple models 110 a-n. As another example, multiple nodes 108 a-n may operate in parallel to facilitate the execution of a single model 110 a-n. In some examples, executing a model at multiple nodes includes assigning different portions of an input data set to the different nodes, where each node executes an entirety of model operations with respect to their respective assigned input data. As yet another example, executing a model at multiple nodes includes executing a first portion of a model (e.g., operations corresponding to one or more first neural network layers) at a first node and second portion of the model (e.g., operations corresponding to one or more second neural network layers) at a second node, where “intermediate” output from the first node is provided as input to the second node.
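The two multi-node execution styles described above can be illustrated with a minimal sketch. The functions first_layers, second_layers, and full_model are hypothetical stand-ins for model operations, and each "node" is simulated by an ordinary function call rather than a separate computing device or virtual machine.

```python
from typing import List, Sequence


def first_layers(x: float) -> float:
    """Hypothetical stand-in for one or more first neural network layers."""
    return x * 2.0


def second_layers(h: float) -> float:
    """Hypothetical stand-in for one or more second neural network layers."""
    return h + 1.0


def full_model(x: float) -> float:
    """The entire model: first layers followed by second layers."""
    return second_layers(first_layers(x))


def split_inputs(inputs: Sequence[float], num_nodes: int) -> List[List[float]]:
    """Assign contiguous portions of the input data set to the nodes."""
    if not inputs:
        return []
    size = -(-len(inputs) // num_nodes)  # ceiling division
    return [list(inputs[i:i + size]) for i in range(0, len(inputs), size)]


def run_data_parallel(inputs: Sequence[float], num_nodes: int) -> List[float]:
    """Each simulated node runs the entire model on its own portion of the input."""
    per_node = [[full_model(x) for x in shard] for shard in split_inputs(inputs, num_nodes)]
    return [y for outputs in per_node for y in outputs]


def run_pipeline(inputs: Sequence[float]) -> List[float]:
    """A first node runs the first layers; its intermediate output feeds a second node."""
    intermediate = [first_layers(x) for x in inputs]   # first node
    return [second_layers(h) for h in intermediate]    # second node
```

In this sketch both styles produce the same outputs for the same inputs; they differ only in how the work is divided among nodes.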

The plurality of models 110 a-n may include machine learning models (e.g., trained machine learning models such as neural networks), algorithmic models, and the like each configured to provide some output based on some input data. In aggregate, the plurality of models 110 a-n are configured to generate a prediction based on input to one or more of the models 110 a-n. Such predictions may include, for example, classifications for a classification problem, a numerical value for a regression problem, and the like. The plurality of models 110 a-n may also output one or more confidence values associated with the prediction. Accordingly, each model 110 a-n is configured to receive, as input, data output by another model 110 a-n, provide output as input data to another model 110 a-n, or both.

Consider the example graph representations of model dependencies shown in FIG. 2. FIG. 2 shows an exemplary arrangement of models and their respective dependencies. One skilled in the art will appreciate that other arrangements or configurations of model dependencies are possible, and that FIG. 2 merely serves as an illustrative example. As shown in FIG. 2, a model 204 a receives, as input data, input 202 a. Input 202 a may include stored data, data from a data stream, or data from another data source as can be appreciated. Model 204 a provides output to models 204 b and 204 c. Model 204 c receives input from models 204 a and 204 b. Model 204 d receives input from model 204 c and input 202 b. Model 204 d provides, as output data, output 206. In the example of FIG. 2, inputs 202 a, b are provided from one or more data sources to models 204 a and 204 d, respectively. Data processing is performed through the various model dependencies in order to ultimately generate output 206. The output 206 may include a prediction based on the inputs 202 a, b.
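As a non-limiting illustration, the dependencies of FIG. 2 may be represented as a directed graph. The sketch below assumes the dependencies are read as follows: model 204 b consumes the output of model 204 a, model 204 c consumes the outputs of models 204 a and 204 b, and model 204 d consumes the output of model 204 c. The identifiers and helper functions are hypothetical. The ordering uses the Python standard library graphlib module (Python 3.9 or later).

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each model maps to the set of models whose output it consumes (assumed reading of FIG. 2).
DEPENDS_ON = {
    "204a": set(),
    "204b": {"204a"},
    "204c": {"204a", "204b"},
    "204d": {"204c"},
}

# Inputs supplied by data sources rather than by another model.
EXTERNAL_INPUTS = {"204a": ["202a"], "204d": ["202b"]}


def execution_order(depends_on):
    """Return an order in which every model runs after the models it depends on."""
    return list(TopologicalSorter(depends_on).static_order())


def downstream_of(model, depends_on):
    """Return the models that consume the output of the given model."""
    return sorted(m for m, upstream in depends_on.items() if model in upstream)
```

A cyclic dependency would cause static_order() to raise graphlib.CycleError, which may serve as a simple validity check on a submitted set of models.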

Turning back to FIG. 1, the management node 102 executes a management module 104 for distributed model execution. The management module 104 identifies, for each model 110 a-n of a plurality of models 110 a-n, based on one or more execution constraints for the plurality of models 110 a-n, a corresponding node 108 a-n of a plurality of nodes 108 a-n. For example, assume that the management module 104 receives a request from the client device 112 to deploy a plurality of models 110 a-n to the model execution environment 106. The request may include the plurality of models 110 a-n. The request may also include identifiers, network addresses, or other data facilitating access to the models 110 a-n. For example, after the plurality of models 110 a-n are uploaded to the model execution environment 106 or another storage location, the request may identify the plurality of models 110 a-n for deployment to the model execution environment 106 for execution. Accordingly, the management module 104 identifies each node 108 a-n to which a model 110 a-n will be deployed for execution.

The management module 104 identifies the nodes 108 a-n for each model 110 a-n based on one or more execution constraints. The one or more execution constraints for a given model 110 a-n are requirements to be satisfied by a given node 108 a-n in order to execute the given model 110 a-n. The one or more execution constraints may include required constraints, where a node 108 a-n must satisfy a particular constraint for a given model 110 a-n to be deployed there. The one or more execution constraints may also include preferential constraints, where a node 108 a-n is more preferentially selected for deployment of a given model 110 a-n if the constraint is satisfied.
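One possible way to represent the distinction between required and preferential constraints is sketched below. The Constraint class, the node attributes ("private", "has_gpu"), and the ranking rule (count of satisfied preferential constraints) are all assumptions made for illustration and are not prescribed by this disclosure.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Node = Dict[str, object]


@dataclass(frozen=True)
class Constraint:
    name: str
    is_satisfied: Callable[[Node], bool]
    required: bool  # True: must be satisfied; False: preferential only


def eligible(node: Node, constraints: List[Constraint]) -> bool:
    """A node is eligible only if it satisfies every required constraint."""
    return all(c.is_satisfied(node) for c in constraints if c.required)


def preference_count(node: Node, constraints: List[Constraint]) -> int:
    """Number of preferential constraints the node satisfies."""
    return sum(1 for c in constraints if not c.required and c.is_satisfied(node))


def rank_nodes(nodes: Dict[str, Node], constraints: List[Constraint]) -> List[str]:
    """Drop ineligible nodes, then order the rest by satisfied preferences."""
    names = [name for name, node in nodes.items() if eligible(node, constraints)]
    return sorted(names, key=lambda name: -preference_count(nodes[name], constraints))


# Hypothetical nodes and constraints for a single model.
NODES = {
    "cpu-node": {"private": True, "has_gpu": False},
    "shared-node": {"private": False, "has_gpu": True},
    "gpu-node": {"private": True, "has_gpu": True},
}
CONSTRAINTS = [
    Constraint("private node", lambda n: bool(n["private"]), required=True),
    Constraint("GPU available", lambda n: bool(n["has_gpu"]), required=False),
]
```

Here the shared node is excluded outright despite having a GPU, while the remaining private nodes are ordered by preference.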

The one or more execution constraints may include one or more model dependencies. For example, turning back to the example of FIG. 2, the model 204 b is dependent on the output of the model 204 a as the model 204 b accepts the output of the model 204 a as its input. Similarly, the model 204 c is dependent on models 204 a and 204 b. Accordingly, a node 108 a-n selected for deploying the model 204 b must have a communications pathway to (or be the same node as) nodes 108 a-n to which the models 204 a and 204 c are deployed. Moreover, in some embodiments, the nodes 108 a-n are selected to reduce or minimize latency between nodes 108 a-n having interdependent models 110 a-n.

The one or more execution constraints may also include one or more encryption constraints. The one or more encryption constraints may indicate that data input to or received from a given model 110 a-n must be encrypted if transferred over a network. The one or more encryption constraints may also indicate that data input to or received from a given model 110 a-n must be encrypted regardless of whether it is transferred over a network (e.g., if the source and destination models 110 a-n are executed in a same node 108 a-n, or executed within different virtual machine nodes 108 a-n implemented in a same hardware environment). The one or more encryption constraints may indicate a type of encryption to be used (e.g., symmetric vs. asymmetric, particular algorithms, and the like). Accordingly, a node 108 a-n may be selected based on an encryption constraint by selecting a node 108 a-n having hardware accelerators, processors, or other resources to facilitate satisfaction of the particular encryption constraints. For example, a model 110 a-n whose output must be encrypted may be preferentially deployed to a node 108 a-n having greater hardware or processing resources, while a model 110 a-n that needs to neither encrypt output nor decrypt input may be preferentially deployed to a node 108 a-n having lesser hardware or processing resources. As a further example, a model 110 a-n whose input must be decrypted and whose output must be encrypted may be preferentially deployed to a node 108 a-n having even greater hardware or processing resources.

The one or more execution constraints may also include one or more authorization constraints. An authorization constraint is a restriction on which entities have access to data input to a model 110 a-n, output by a model 110 a-n, generated by the model 110 a-n (e.g., intermediary data or calculations), and the like. For example, an authorization constraint may indicate that a model 110 a-n should be executed on a private node 108 a-n (e.g., a node 108 a-n not shared by or accessible to another tenant or client of the model execution environment 106). As a further example, an authorization constraint may define access privileges for those users or other entities that may access the node 108 a-n executing a given model 110 a-n. As another example, an authorization constraint may indicate that the input to or output from a given model 110 a-n should be transferred only over a private network. Accordingly, the node 108 a-n for the given model 110 a-n should be selected as having access to a private network connection to nodes 108 a-n executing its dependent models 110 a-n.

The management module 104 may also identify the nodes 108 a-n for each model 110 a-n based on one or more node characteristics. Node characteristics for a given node 108 a-n may include hardware resources for the node 108 a-n. Such hardware resources may include storage devices, memory (e.g., random access memory (RAM)), processors, hardware accelerators, network interfaces, and the like. Software resources may include particular operating systems, software libraries, applications, and the like. For example, a model 110 a-n processing highly-dimensional data or large amounts of data at a time may be preferentially deployed to a node 108 a-n having more RAM than another node 108 a-n. As another example, a model 110 a-n that uses a particular encryption algorithm for encrypting output data or decrypting input data may be preferentially deployed to a node 108 a-n having the requisite libraries for performing the algorithm installed.

The management module 104 may also identify the nodes 108 a-n for each model 110 a-n based on one or more model characteristics. The model characteristics for a given model 110 a-n describe the data acted upon and the calculations performed by the model 110 a-n. For example, model characteristics may include a data type for data input to the model 110 a-n. A data type for input data may describe a type of value included in the input data (e.g., integer, floating point, bytes, and the like). A data type for input data may also describe a data structure or class of the input data (e.g., single values, multidimensional data structures, labeled or unlabeled data, time series data, and the like). Model characteristics may also include types of calculations or transformations performed by the model (e.g., arithmetic calculations, floating point operations, matrix operations, Boolean operations, and the like). For example, models 110 a-n performing complex matrix operations on multidimensional floating point data may be preferentially deployed to nodes 108 a-n with GPUs, FPGAs, or other hardware accelerators to facilitate execution of such operations. As another example, a neural network model may be deployed to different node(s) based on architectural parameters, such as whether the neural network is a feed-forward network or a recurrent network, whether the neural network exhibits “memory” (e.g., via long short-term memory (LSTM) architecture), etc.

Identifying, for each model 110 a-n of a plurality of models 110 a-n, based on one or more execution constraints for the plurality of models 110 a-n, a corresponding node 108 a-n of a plurality of nodes 108 a-n may include calculating, for each model 110 a-n of the plurality of models 110 a-n, a plurality of fitness scores for each of the plurality of nodes 108 a-n. In other words, a given model 110 a-n has a fitness score calculated for each of the nodes 108 a-n indicating a fitness of that node 108 a-n for the given model 110 a-n. Each fitness score for a given model may be calculated based on a degree to which the node 108 a-n satisfies the execution constraints for the model 110 a-n.

A node 108 a-n may receive a higher fitness score for satisfying an execution constraint to a greater degree than another node 108 a-n. For example, assume that a first model 110 a-n is dependent on a second model 110 a-n (e.g., for input by virtue of receiving input from the second model 110 a-n, or output by virtue of providing output to the second model 110 a-n), and that the first model 110 a-n is selected for deployment to a first node 108 a-n. Further assume that a second node 108 a-n and a third node 108 a-n are both communicatively coupled to the first node 108 a-n, with the second node 108 a-n having a lower latency connection to the first node 108 a-n compared to a connection from the third node 108 a-n to the first node 108 a-n. Accordingly, the second model 110 a-n would have a higher fitness score for the second node 108 a-n than the third node 108 a-n by virtue of the lower latency connection to the first node 108 a-n to which the dependent first model 110 a-n is to be deployed.
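The latency example above may be sketched as follows. The node names, the latency figures, and the scoring formula 1 / (1 + latency) are assumptions chosen only so that a lower latency yields a higher score; other mappings are equally possible.

```python
from typing import Dict, Tuple

# Hypothetical measured latencies, in milliseconds, between pairs of nodes.
LATENCY_MS: Dict[Tuple[str, str], float] = {
    ("node-2", "node-1"): 2.0,
    ("node-3", "node-1"): 20.0,
}


def latency_fitness(candidate: str, peer: str) -> float:
    """Score a candidate node by its latency to the node hosting a dependent model.

    Returns 1.0 when both models would share a node, 0.0 when no
    communications pathway is known, and a value between 0 and 1 otherwise.
    """
    if candidate == peer:
        return 1.0
    latency = LATENCY_MS.get((candidate, peer), LATENCY_MS.get((peer, candidate)))
    if latency is None:
        return 0.0
    return 1.0 / (1.0 + latency)
```

With the first model assigned to "node-1", the second model scores higher on "node-2" than on "node-3", matching the example in the preceding paragraph.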

A node 108 a-n may receive a null or zero fitness score for failing to satisfy a required execution constraint. For example, assume that a given model 110 a-n must be executed on a private node 108 a-n. Any nodes 108 a-n accessible to other tenants may receive a zero fitness score for failing to meet the privacy requirement.
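A minimal sketch of computing a fitness score for every model and node pair, with a zero score for a failed required constraint, is shown below. The attribute names, the example nodes and models, and the use of RAM as the only preferential factor are hypothetical simplifications.

```python
from typing import Dict

# Hypothetical node characteristics.
NODES = {
    "node-a": {"private": True, "ram_gb": 16},
    "node-b": {"private": False, "ram_gb": 64},
    "node-c": {"private": True, "ram_gb": 32},
}

# Hypothetical models; "requires_private" stands in for a required constraint.
MODELS = {
    "model-x": {"requires_private": True},
    "model-y": {"requires_private": False},
}

MAX_RAM_GB = 64  # assumed normalization constant


def fitness(model: Dict[str, bool], node: Dict[str, object]) -> float:
    """Return 0.0 if a required constraint fails, else a score in [0, 1]."""
    if model["requires_private"] and not node["private"]:
        return 0.0  # required constraint not satisfied
    return min(node["ram_gb"] / MAX_RAM_GB, 1.0)  # preferential: more RAM scores higher


def score_matrix(models, nodes) -> Dict[str, Dict[str, float]]:
    """Calculate, for each model, a fitness score for each node."""
    return {
        m_name: {n_name: fitness(m, n) for n_name, n in nodes.items()}
        for m_name, m in models.items()
    }


SCORES = score_matrix(MODELS, NODES)
```

In this sketch "node-b" has the most RAM yet scores zero for "model-x" because it is accessible to other tenants, so "node-c" becomes the highest scoring node for that model.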

The fitness score may also be calculated based on the node characteristics of each node 108 a-n or the model characteristics of the model 110 a-n. For example, for a model 110 a-n performing calculations on highly-dimensional data, nodes 108 a-n with greater RAM may be assigned a higher fitness score. As another example, nodes 108 a-n having advanced processors or hardware accelerators may not receive higher fitness scores for models 110 a-n acting on low-dimensional data or performing simpler arithmetic calculations, as such hardware resources would not provide such models a meaningful benefit when compared to other models 110 a-n.

The management module 104 may then select, for each model 110 a-n, based on the plurality of fitness scores, the corresponding node 108 a-n (e.g., the node 108 a-n to which a given model 110 a-n will be deployed). In some embodiments, selecting, based on the plurality of fitness scores, the corresponding node 108 a-n includes selecting, for each model 110 a-n, a highest scoring node 108 a-n. For example, a node 108 a-n may be selected for each model 110 a-n by traversing a listing or ordering of models 110 a-n and selecting a node 108 a-n for a currently selected model 110 a-n. In some embodiments, after a model 110 a-n is assigned to a given node 108 a-n, the fitness scores for the given node 108 a-n may be recalculated for each model 110 a-n not having an assigned node 108 a-n. Accordingly, in some embodiments, a node 108 a-n already having an assigned model 110 a-n may still be an optimal selection for deploying another model 110 a-n. In other embodiments, selecting, based on the plurality of fitness scores, the corresponding node 108 a-n includes generating multiple combinations or permutations of assigning models 110 a-n for deployment to nodes 108 a-n and calculating a best fit assignment for all of the plurality of models 110 a-n (e.g., an assignment with a highest total fitness score across all models 110 a-n).
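The two selection approaches described above may be contrasted with a short sketch. The recalculation rule used here (a node's score for a model is divided by the number of models the node would host) is an assumption made for illustration, as are the example scores and names.

```python
from itertools import product
from typing import Dict

Scores = Dict[str, Dict[str, float]]


def shared_score(base: float, models_on_node: int) -> float:
    """Assumed recalculation rule: a node's score is divided among its hosted models."""
    return base / models_on_node


def greedy_assign(scores: Scores) -> Dict[str, str]:
    """Traverse the models in order, selecting the highest scoring node for each."""
    assignment: Dict[str, str] = {}
    load: Dict[str, int] = {}
    for model, node_scores in scores.items():
        best = max(
            node_scores,
            key=lambda n: shared_score(node_scores[n], load.get(n, 0) + 1),
        )
        assignment[model] = best
        load[best] = load.get(best, 0) + 1  # later scores for this node are recalculated
    return assignment


def total_fitness(scores: Scores, assignment: Dict[str, str]) -> float:
    """Total fitness of a complete assignment under the same sharing rule."""
    load: Dict[str, int] = {}
    for node in assignment.values():
        load[node] = load.get(node, 0) + 1
    return sum(shared_score(scores[m][n], load[n]) for m, n in assignment.items())


def best_fit_assign(scores: Scores) -> Dict[str, str]:
    """Evaluate every combination of model-to-node assignments and keep the best."""
    models = list(scores)
    nodes = sorted({n for node_scores in scores.values() for n in node_scores})
    candidates = (dict(zip(models, combo)) for combo in product(nodes, repeat=len(models)))
    return max(candidates, key=lambda a: total_fitness(scores, a))


# Hypothetical fitness scores: model -> node -> score.
EXAMPLE_SCORES = {
    "m1": {"n1": 0.9, "n2": 0.8},
    "m2": {"n1": 0.9, "n2": 0.3},
}
```

With these example scores the greedy traversal places both models on "n1", whereas the exhaustive search places "m1" on "n2" and "m2" on "n1" for a higher total. The exhaustive search evaluates a number of combinations that grows exponentially with the number of models, so it is practical only for small deployments.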

The management module 104 then deploys each model 110 a-n of the plurality of models 110 a-n to the identified corresponding node 108 a-n of the plurality of nodes 108 a-n. Deploying each model 110 a-n may include sending one or more of the models 110 a-n to their respective assigned node 108 a-n. Deploying each model 110 a-n may also include causing a node 108 a-n to acquire or load its assigned model 110 a-n. For example, the management module 104 may issue a command for a given node 108 a-n to load its assigned model 110 a-n from a local or remote storage location.

Deploying each model 110 a-n may also include configuring one or more models 110 a-n to receive input from one or more data sources (e.g., data sources other than another model 110 a-n). For example, the management module 104 may provide a node 108 a-n of a given model 110 a-n network addresses (e.g., Uniform Resource Locators (URLs), Internet Protocol (IP) addresses) or other identifiers for data sources of input data to the given model 110 a-n. For example, the management module 104 may provide a node 108 a-n a URL or IP address for a data stream of data to be provided as input to the given model 110 a-n. As another example, the management module 104 may provide a node a URL, IP address, memory address, or file path to stored data to be provided as input to the given model 110 a-n. The management module 104 may also provide authentication credentials, login credentials, or other data facilitating access to stored data or data streams.

Deploying each model 110 a-n may also include configuring one or more models 110 a-n to provide, as output, a prediction generated by the plurality of models 110 a-n. For example, the management module 104 may indicate a storage location or file path for output data. The management module 104 may further provide an indication of the storage location of the output data to the client device 112.

In some embodiments, deploying each model 110 a-n includes configuring each node 108 a-n to communicate with at least one other node 108 a-n of the plurality of nodes 108 a-n. Thus, interdependent models 110 a-n may communicate with each other via the configured nodes 108 a-n. For example, the management module 104 may facilitate the exchange of encryption keys between nodes 108 a-n executing dependent models 110 a-n requiring encryption. As another example, the management module 104 may provide, to nodes 108 a-n executing models 110 a-n having dependent models 110 a-n, the URLs, IP addresses, or other identifiers of the nodes 108 a-n executing their respective dependent models 110 a-n. In some embodiments, the management module 104 may allocate or generate communications pathways between nodes 108 a-n via network communications fabrics of the model execution environment 106. In some embodiments, the management module 104 may configure Application Program Interface (API) calls or queries facilitating communications between any of the nodes 108 a-n.
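As one illustration of providing nodes with the addresses of their peers, the sketch below derives, from a set of model dependencies and a model-to-node assignment, the list of peer addresses each node would need. The function name, the model and node identifiers, and the IP addresses are all hypothetical.

```python
from typing import Dict, List


def build_peer_config(
    dependencies: Dict[str, List[str]],
    assignment: Dict[str, str],
    addresses: Dict[str, str],
) -> Dict[str, List[str]]:
    """For each node, list the addresses of nodes hosting models it exchanges data with.

    dependencies maps a model to the models whose output it consumes;
    assignment maps a model to its node; addresses maps a node to its address.
    """
    peers = {node: set() for node in assignment.values()}
    for model, upstream in dependencies.items():
        for source in upstream:
            node_a, node_b = assignment[model], assignment[source]
            if node_a != node_b:  # models on the same node need no network peer
                peers[node_a].add(addresses[node_b])
                peers[node_b].add(addresses[node_a])
    return {node: sorted(addrs) for node, addrs in peers.items()}


# Hypothetical deployment: m2 consumes m1's output, m3 consumes m2's output.
DEPENDENCIES = {"m1": [], "m2": ["m1"], "m3": ["m2"]}
ASSIGNMENT = {"m1": "n1", "m2": "n2", "m3": "n2"}
ADDRESSES = {"n1": "10.0.0.1", "n2": "10.0.0.2"}

PEER_CONFIG = build_peer_config(DEPENDENCIES, ASSIGNMENT, ADDRESSES)
```

Because "m2" and "m3" share a node in this example, only the link between "n1" and "n2" appears in the resulting configuration.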

A prediction may then be generated by the deployed models 110 a-n. For example, input data may be provided to one or more of the models 110 a-n. A prediction may then be generated as an output of a model 110 a-n by virtue of the distributed and interdependent execution of the models 110 a-n in the nodes 108 a-n. Data indicating the prediction may then be provided or made accessible to the client device 112.

By deploying the models 110 a-n to the nodes 108 a-n of the model execution environment 106 as described above, the management module 104 ensures a useable configuration of models 110 a-n as deployed to nodes 108 a-n. Moreover, the management module 104 ensures that the model 110 a-n deployment preserves the hierarchy of dependencies of models 110 a-n, as well as the encryption and authorization requirements of the models 110 a-n.

For further explanation, FIG. 3 sets forth a diagram of an execution environment 300 in accordance with some embodiments of the present disclosure. The execution environment 300 depicted in FIG. 3 may be embodied in a variety of different ways. The execution environment 300 may be provided, for example, by one or more cloud computing providers such as Amazon Web Services (AWS), Microsoft Azure, Google Cloud, and others, including combinations thereof. Alternatively, the execution environment 300 may be embodied as a collection of devices (e.g., servers, storage devices, networking devices) and software resources that are included in a private data center. In fact, the execution environment 300 may be embodied as a combination of cloud resources and private resources that collectively form a hybrid cloud computing environment.

The execution environment 300 depicted in FIG. 3 may include storage resources 302, which may be embodied in many forms. For example, the storage resources 302 may include flash memory, hard disk drives, nano-RAM, non-volatile memory (NVM), 3D crosspoint non-volatile memory, magnetic random access memory (MRAM), non-volatile phase-change memory (PCM), storage class memory (SCM), or many others, including combinations of the storage technologies described above. Readers will appreciate that other forms of computer memories and storage devices may be utilized as part of the execution environment 300, including DRAM, static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), universal memory, and many others. The storage resources 302 may also be embodied, in embodiments where the execution environment 300 includes resources offered by a cloud provider, as cloud storage resources such as Amazon Elastic Block Store (EBS) block storage, Amazon S3 object storage, Amazon Elastic File System (EFS) file storage, Azure Blob Storage, and many others. The example execution environment 300 depicted in FIG. 3 may implement a variety of storage architectures, such as block storage where data is stored in blocks, and each block essentially acts as an individual hard drive, object storage where data is managed as objects, or file storage in which data is stored in a hierarchical structure. Such data may be saved in files and folders, and presented to both the system storing it and the system retrieving it in the same format.

The execution environment 300 depicted in FIG. 3 also includes communications resources 304 that may be useful in facilitating data communications between components within the execution environment 300, as well as data communications between the execution environment 300 and computing devices that are outside of the execution environment 300. Such communications resources may be embodied, for example, as one or more routers, network switches, communications adapters, and many others, including combinations of such devices. The communications resources 304 may be configured to utilize a variety of different protocols and data communication fabrics to facilitate data communications. For example, the communications resources 304 may utilize Internet Protocol (IP) based technologies, fibre channel (FC) technologies, FC over ethernet (FCoE) technologies, InfiniBand (IB) technologies, NVM Express (NVMe) technologies and NVMe over fabrics (NVMeoF) technologies, and many others. The communications resources 304 may also be embodied, in embodiments where the execution environment 300 includes resources offered by a cloud provider, as networking tools and resources that enable secure connections to the cloud as well as tools and resources (e.g., network interfaces, routing tables, gateways) to configure networking resources in a virtual private cloud.

The execution environment 300 depicted in FIG. 3 also includes processing resources 306 that may be useful in executing computer program instructions and performing other computational tasks within the execution environment 300. The processing resources 306 may include one or more application-specific integrated circuits (ASICs) that are customized for some particular purpose, one or more central processing units (CPUs), one or more digital signal processors (DSPs), one or more field-programmable gate arrays (FPGAs), one or more systems on a chip (SoCs), or other form of processing resources 306. The processing resources 306 may also be embodied, in embodiments where the execution environment 300 includes resources offered by a cloud provider, as cloud computing resources such as one or more Amazon Elastic Compute Cloud (EC2) instances, event-driven compute resources such as AWS Lambdas, Azure Virtual Machines, or many others.

The execution environment 300 depicted in FIG. 3 also includes software resources 308 that, when executed by processing resources 306 within the execution environment 300, may perform various tasks. The software resources 308 may include, for example, one or more modules of computer program instructions that when executed by processing resources 306 within the execution environment 300 are useful for distributed model execution. The software resources may include one or more models 310 (e.g., models 110 a-n as executed in nodes 108 a-n of FIG. 1). The software resources may also include a management module 312 (e.g., a management module 104 as described in FIG. 1). Accordingly, the execution environment 300 may include one or more of a management node 102 or a model execution environment 106 as described in FIG. 1.

For further explanation, FIG. 4 sets forth a flow chart illustrating an example method for distributed model execution that includes identifying 402, for each model 110 a-n of a plurality of models 110 a-n, based on one or more execution constraints for the plurality of models 110 a-n, a corresponding node 108 a-n of a plurality of nodes 108 a-n. For example, assume that the management module 104 receives a request from the client device 112 to deploy a plurality of models 110 a-n to the model execution environment 106. The request may include the plurality of models 110 a-n. The request may also include identifiers, network addresses, or other data facilitating access to the models 110 a-n. For example, after the plurality of models 110 a-n are uploaded to the model execution environment 106 or another storage location, the request may identify the plurality of models 110 a-n for deployment to the model execution environment 106 for execution. Accordingly, the management module 104 identifies each node 108 a-n to which a model 110 a-n will be deployed for execution.

The nodes 108 a-n for each model 110 a-n are identified based on one or more execution constraints. The one or more execution constraints for a given model 110 a-n are requirements to be satisfied by a given node 108 a-n in order to execute the given model 110 a-n. The one or more execution constraints may include required constraints, where a node 108 a-n must satisfy a particular constraint for a given model 110 a-n to be deployed there. The one or more execution constraints may also include preferential constraints, where a node 108 a-n is more preferentially selected for deployment of a given model 110 a-n if the constraint is satisfied.

The one or more execution constraints may include one or more model dependencies. For example, turning back to the example of FIG. 2, the model 204 b is dependent on the output of the model 204 a as the model 204 b accepts the output of the model 204 a as its input. Similarly, the model 204 c is dependent on models 204 a and 204 b. Accordingly, a node 108 a-n selected for deploying the model 204 b must have a communications pathway to nodes 108 a-n to which the models 204 a and 204 c are deployed. Moreover, in some embodiments, the nodes 108 a-n are selected to reduce or minimize latency between nodes 108 a-n having interdependent models 110 a-n.

The one or more execution constraints may also include one or more encryption constraints. The one or more encryption constraints may indicate that data input to or received from a given model 110 a-n must be encrypted if transferred over a network. The one or more encryption constraints may also indicate that data input to or received from a given model 110 a-n must be encrypted regardless of whether it is transferred over a network (e.g., if the source and destination models 110 a-n are executed in a same node 108 a-n, or executed within different virtual machine nodes 108 a-n implemented in a same hardware environment). The one or more encryption constraints may indicate a type of encryption to be used (e.g., symmetric vs. asymmetric, particular algorithms, and the like). Accordingly, a node 108 a-n may be selected based on an encryption constraint by selecting a node 108 a-n having hardware accelerators, processors, or other resources to facilitate satisfaction of the particular encryption constraints. For example, a model 110 a-n whose output must be encrypted may be preferentially deployed to a node 108 a-n having greater hardware or processing resources, while a model 110 a-n that needs neither to encrypt output nor to decrypt input may be preferentially deployed to a node 108 a-n having lesser hardware or processing resources. As a further example, a model 110 a-n whose input must be decrypted and whose output must be encrypted may be preferentially deployed to a node 108 a-n having even greater hardware or processing resources.
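For further explanation, the two kinds of encryption constraint described above may be sketched as a simple decision function. The sketch is illustrative only. The `EncryptionPolicy` enumeration and the `must_encrypt` function are hypothetical names, and the sketch assumes, as a simplification, that a transfer crosses a network whenever the source and destination node identifiers differ.

```python
from enum import Enum


class EncryptionPolicy(Enum):
    """Hypothetical encoding of an encryption constraint for a model."""
    NONE = "none"                  # no encryption constraint
    NETWORK_ONLY = "network_only"  # encrypt only when data crosses a network
    ALWAYS = "always"              # encrypt even when no network is involved


def must_encrypt(policy: EncryptionPolicy, source_node: str, destination_node: str) -> bool:
    """Decide whether data passed between two models must be encrypted."""
    if policy is EncryptionPolicy.ALWAYS:
        return True
    if policy is EncryptionPolicy.NETWORK_ONLY:
        # Assumption: differing node identifiers imply a network transfer.
        return source_node != destination_node
    return False
```

For example, `must_encrypt(EncryptionPolicy.NETWORK_ONLY, "node-1", "node-1")` evaluates to `False`, while the same call with `EncryptionPolicy.ALWAYS` evaluates to `True`.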

The one or more execution constraints may also include one or more authorization constraints. An authorization constraint is a restriction on which entities have access to data input to a model 110 a-n, output by a model 110 a-n, generated by the model 110 a-n (e.g., intermediary data or calculations), and the like. For example, an authorization constraint may indicate that a model 110 a-n should be executed on a private node 108 a-n (e.g., a node 108 a-n not shared by or accessible to another tenant or client of the model execution environment 106). As a further example, an authorization constraint may define access privileges for those users or other entities that may access the node 108 a-n executing a given model 110 a-n. As another example, an authorization constraint may indicate that the input to or output from a given model 110 a-n should be transferred only over a private network. Accordingly, the node 108 a-n for the given model 110 a-n should be selected as having access to a private network connection to nodes 108 a-n executing its dependent models 110 a-n.

The management module 104 may also identify the nodes 108 a-n for each model 110 a-n based on one or more node characteristics. Node characteristics for a given node 108 a-n may include hardware resources for the node 108 a-n. Such hardware resources may include storage devices, memory (e.g., random access memory (RAM)), processors, hardware accelerators, network interfaces, and the like. Node characteristics for a given node 108 a-n may also include software resources for the node 108 a-n. Such software resources may include particular operating systems, software libraries, applications, and the like. For example, a model 110 a-n processing highly-dimensional data or large amounts of data at a time may be preferentially deployed to a node 108 a-n having more RAM than another node 108 a-n. As another example, a model 110 a-n that uses a particular encryption algorithm for encrypting output data or decrypting input data may be preferentially deployed to a node 108 a-n having the requisite libraries for performing the algorithm installed.

The management module 104 may also identify the nodes 108 a-n for each model 110 a-n based on one or more model characteristics. The model characteristics for a given model 110 a-n describe the data acted upon and the calculations performed by the model 110 a-n. For example, model characteristics may include a data type for data input to the model 110 a-n. A data type for input data may describe a type of value included in the input data (e.g., integer, floating point, bytes, and the like). A data type for input data may also describe a data structure or class of the input data (e.g., single values, multidimensional data structures, and the like). Model characteristics may also include types of calculations or transformations performed by the model (e.g., arithmetic calculations, floating point operations, matrix operations, Boolean operations, and the like). For example, models 110 a-n performing complex matrix operations on multidimensional floating point data may be preferentially deployed to nodes 108 a-n with GPUs, FPGAs, or other hardware accelerators to facilitate execution of such operations.

The method of FIG. 4 also includes deploying 404 each model 110 a-n of the plurality of models 110 a-n to the identified corresponding node 108 a-n of the plurality of nodes 108 a-n. Deploying 404 each model 110 a-n may include sending one or more of the models 110 a-n to their respective assigned node 108 a-n. Deploying 404 each model 110 a-n may also include causing a node 108 a-n to acquire or load its assigned model 110 a-n. For example, the management module 104 may issue a command for a given node 108 a-n to load its assigned model 110 a-n from a local or remote storage location.

Deploying 404 each model 110 a-n may also include configuring one or more models 110 a-n to receive input from one or more data sources (e.g., data sources other than another model 110 a-n). For example, the management module 104 may provide a node 108 a-n of a given model 110 a-n network addresses (e.g., Uniform Resource Locators (URLs), Internet Protocol (IP) addresses) or other identifiers for data sources of input data to the given model 110 a-n. For example, the management module 104 may provide a node 108 a-n a URL or IP address for a data stream of data to be provided as input to the given model 110 a-n. As another example, the management module 104 may provide a node a URL, IP address, memory address, or file path to stored data to be provided as input to the given model 110 a-n. The management module 104 may also provide authentication credentials, login credentials, or other data facilitating access to stored data or data streams.

Deploying 404 each model 110 a-n may also include configuring one or more models 110 a-n to provide, as output, a prediction generated by the plurality of models 110 a-n. For example, the management module 104 may indicate a storage location or file path for output data. The management module 104 may further provide an indication of the storage location of the output data to the client device 112.

One skilled in the art will appreciate that the approaches set forth above with respect to FIG. 4 may be performed repeatedly such that models 110 a-n are redistributed or redeployed according to various circumstances. Such circumstances may include, for example, a user request, a predefined interval passing, an addition or removal of a node 108 a-n or model 110 a-n, a change in available computational resources in nodes 108 a-n, and the like.

For further explanation, FIG. 5 sets forth a flow chart illustrating another example method for distributed model execution according to embodiments of the present disclosure. The method of FIG. 5 is similar to that of FIG. 4 in that the method of FIG. 5 also includes identifying 402, for each model 110 a-n of a plurality of models 110 a-n, based on one or more execution constraints for the plurality of models 110 a-n, a corresponding node 108 a-n of a plurality of nodes 108 a-n; and deploying 404 each model 110 a-n of the plurality of models 110 a-n to the identified corresponding node 108 a-n of the plurality of nodes 108 a-n.

The method of FIG. 5 differs from FIG. 4 in that identifying 402, for each model 110 a-n of a plurality of models 110 a-n, based on one or more execution constraints for the plurality of models 110 a-n, a corresponding node 108 a-n of a plurality of nodes 108 a-n includes calculating 502, for each model 110 a-n of the plurality of models 110 a-n, a plurality of fitness scores for each of the plurality of nodes 108 a-n. In other words, a given model 110 a-n has a fitness score calculated for each of the nodes 108 a-n indicating a fitness of that node 108 a-n for the given model 110 a-n. Each fitness score for a given model may be calculated based on a degree to which the node 108 a-n satisfies the execution constraints for the model 110 a-n.

A node 108 a-n may receive a higher fitness score for satisfying an execution constraint to a greater degree than another node 108 a-n. For example, assume that a first model 110 a-n is dependent on a second model 110 a-n (e.g., for input or output), and that the first model 110 a-n is selected for deployment to a first node 108 a-n. Further assume that a second node 108 a-n and a third node 108 a-n are both communicatively coupled to the first node 108 a-n, with the second node 108 a-n having a lower latency connection to the first node 108 a-n compared to a connection from the third node 108 a-n to the first node 108 a-n. Accordingly, the second model 110 a-n would have a higher fitness score for the second node 108 a-n than the third node 108 a-n by virtue of the lower latency connection to the first node 108 a-n to which the dependent first model 110 a-n is to be deployed.

A node 108 a-n may receive a null or zero fitness score for failing to satisfy a required execution constraint. For example, assume that a given model 110 a-n must be executed on a private node 108 a-n. Any nodes 108 a-n accessible to other tenants may receive a zero fitness score for failing to meet the privacy requirement.
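For further explanation, the two preceding examples may be sketched together as a single scoring function. The sketch is illustrative only. The `fitness_score` function, the dictionary keys, and the particular formula relating latency to score are assumptions made for this example. This disclosure does not prescribe a specific formula.

```python
from typing import Dict


def fitness_score(model: Dict, node: Dict, latency_ms_to_peer: float) -> float:
    """Score a candidate node for a model (illustrative formula only)."""
    # Required constraint: a model that must run on a private node
    # receives a zero score on any node accessible to other tenants.
    if model["requires_private_node"] and not node["private"]:
        return 0.0
    # Preferential constraint: a lower latency connection to the node
    # hosting the interdependent model yields a higher score.
    return 1.0 / (1.0 + latency_ms_to_peer)


model = {"name": "second-model", "requires_private_node": True}
second_node = {"id": "node-2", "private": True}
third_node = {"id": "node-3", "private": True}
shared_node = {"id": "node-4", "private": False}

score_second = fitness_score(model, second_node, latency_ms_to_peer=2.0)
score_third = fitness_score(model, third_node, latency_ms_to_peer=20.0)
score_shared = fitness_score(model, shared_node, latency_ms_to_peer=1.0)
```

Here the second node scores higher than the third node because of its lower latency connection, and the shared node scores zero despite having the lowest latency because it fails the required privacy constraint.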

The fitness score may also be calculated based on the node characteristics of each node 108 a-n or the model characteristics of the model 110 a-n. For example, nodes 108 a-n with greater RAM may receive a higher fitness score for a model 110 a-n performing calculations on highly-dimensional data. As another example, nodes 108 a-n having advanced processors or hardware accelerators may not receive higher fitness scores for models 110 a-n acting on low-dimensional data or performing simpler arithmetic calculations, as such hardware resources would not provide a meaningful benefit to such models 110 a-n when compared to other models 110 a-n.
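For further explanation, a fitness score based on node characteristics and model characteristics may be sketched as follows. The sketch is illustrative only. The `characteristic_score` function, the dictionary keys, and the numeric weights are hypothetical and were chosen solely to make the example concrete.

```python
from typing import Dict


def characteristic_score(model: Dict, node: Dict) -> float:
    """Score a node for a model from their characteristics (assumed weights)."""
    score = 1.0
    # Highly-dimensional input benefits from more RAM (assumed linear weight).
    if model["high_dimensional"]:
        score += node["ram_gb"] / 64.0
    # Matrix-heavy models benefit from a hardware accelerator.
    if model["matrix_ops"] and node["has_accelerator"]:
        score += 1.0
    return score


big_node = {"id": "node-1", "ram_gb": 256, "has_accelerator": True}
small_node = {"id": "node-2", "ram_gb": 16, "has_accelerator": False}
heavy_model = {"name": "heavy", "high_dimensional": True, "matrix_ops": True}
simple_model = {"name": "simple", "high_dimensional": False, "matrix_ops": False}

heavy_scores = (
    characteristic_score(heavy_model, big_node),
    characteristic_score(heavy_model, small_node),
)
simple_scores = (
    characteristic_score(simple_model, big_node),
    characteristic_score(simple_model, small_node),
)
```

In this sketch the simple model receives the same score on both nodes, reflecting that additional RAM or an accelerator provides it no meaningful benefit, while the heavy model scores higher on the better-equipped node.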

Identifying 402, for each model 110 a-n of a plurality of models 110 a-n, based on one or more execution constraints for the plurality of models 110 a-n, a corresponding node 108 a-n of a plurality of nodes 108 a-n also includes selecting 504, for each model 110 a-n, based on the plurality of fitness scores, the corresponding node 108 a-n (e.g., the node 108 a-n to which a given model 110 a-n will be deployed). In some embodiments, selecting, based on the plurality of fitness scores, the corresponding node 108 a-n includes selecting, for each model 110 a-n, a highest scoring node 108 a-n. For example, a node 108 a-n may be selected for each model 110 a-n by traversing a listing or ordering of models 110 a-n and selecting a node 108 a-n for a currently selected model 110 a-n. In some embodiments, after a model 110 a-n is assigned to a given node 108 a-n, the fitness scores for the given node 108 a-n may be recalculated for each model 110 a-n not having an assigned node 108 a-n. Accordingly, in some embodiments, a node 108 a-n already having an assigned model 110 a-n may still be an optimal selection for deploying another model 110 a-n. In other embodiments, selecting, based on the plurality of fitness scores, the corresponding node 108 a-n includes generating multiple combinations or permutations of assigning models 110 a-n for deployment to nodes 108 a-n and calculating a best fit assignment for all of the plurality of models 110 a-n (e.g., an assignment with a highest total fitness score across all models 110 a-n).
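For further explanation, the two selection approaches described above may be sketched side by side. The sketch is illustrative only. The base scores, the node and model names, and in particular the rule that a node's score is halved for each model already assigned to it are assumptions made for this example. This disclosure does not specify how recalculated scores are derived.

```python
from itertools import product
from typing import Dict, List

# Hypothetical base fitness scores, indexed as BASE[model][node].
BASE: Dict[str, Dict[str, float]] = {
    "model-a": {"node-1": 0.9, "node-2": 0.8},
    "model-b": {"node-1": 0.8, "node-2": 0.1},
}
NODES: List[str] = ["node-1", "node-2"]


def score(model: str, node: str, load: int) -> float:
    """Fitness of a node for a model, halved per model already assigned there."""
    return BASE[model][node] * (0.5 ** load)


def total_score(assignment: Dict[str, str]) -> float:
    """Sum the fitness scores of an assignment, applying the load penalty in model order."""
    load = {n: 0 for n in NODES}
    total = 0.0
    for model, node in assignment.items():
        total += score(model, node, load[node])
        load[node] += 1
    return total


def select_greedy() -> Dict[str, str]:
    """Traverse the models in order, picking the highest scoring node for each
    and recalculating that node's scores before the next model is considered."""
    load = {n: 0 for n in NODES}
    assignment: Dict[str, str] = {}
    for model in BASE:
        best = max(NODES, key=lambda n: score(model, n, load[n]))
        assignment[model] = best
        load[best] += 1
    return assignment


def select_best_fit() -> Dict[str, str]:
    """Enumerate every model-to-node assignment and keep the highest total score."""
    models = list(BASE)
    candidates = (dict(zip(models, combo)) for combo in product(NODES, repeat=len(models)))
    return max(candidates, key=total_score)
```

With these assumed scores, the greedy traversal places both models on the same node, showing that a node already having an assigned model may still be selected again, whereas the exhaustive enumeration finds an assignment with a higher total score. The enumeration grows exponentially with the number of models, so it is practical only for small deployments.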

For further explanation, FIG. 6 sets forth a flow chart illustrating another example method for distributed model execution according to embodiments of the present disclosure. The method of FIG. 6 is similar to that of FIG. 4 in that the method of FIG. 6 also includes identifying 402, for each model 110 a-n of a plurality of models 110 a-n, based on one or more execution constraints for the plurality of models 110 a-n, a corresponding node 108 a-n of a plurality of nodes 108 a-n; and deploying 404 each model 110 a-n of the plurality of models 110 a-n to the identified corresponding node 108 a-n of the plurality of nodes 108 a-n.

The method of FIG. 6 differs from FIG. 4 in that deploying 404 each model 110 a-n of the plurality of models 110 a-n to the identified corresponding node 108 a-n of the plurality of nodes 108 a-n includes configuring 602 each node 108 a-n to communicate with at least one other node 108 a-n of the plurality of nodes 108 a-n. Thus, interdependent models 110 a-n may communicate with each other via the configured nodes 108 a-n. For example, the management module 104 may facilitate the exchange of encryption keys between nodes 108 a-n executing dependent models 110 a-n requiring encryption. As another example, the management module 104 may provide, to nodes 108 a-n executing models 110 a-n having dependent models 110 a-n, the URLs, IP addresses, or other identifiers of the nodes 108 a-n executing their respective dependent models 110 a-n. In some embodiments, the management module 104 may allocate or generate communications pathways between nodes 108 a-n via network communications fabrics of the model execution environment 106. In some embodiments, the management module 104 may configure Application Program Interface (API) calls or queries facilitating communications between any of the nodes 108 a-n.
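For further explanation, providing each node with the addresses of the nodes executing its interdependent models may be sketched as follows. The sketch is illustrative only. The `peer_configuration` function, the model and node names, and the IP addresses are hypothetical values introduced for this example.

```python
from typing import Dict, List, Set

# Hypothetical inputs: model dependencies, model placement, and node addresses.
depends_on: Dict[str, List[str]] = {
    "model-b": ["model-a"],
    "model-c": ["model-a", "model-b"],
}
placement: Dict[str, str] = {"model-a": "node-1", "model-b": "node-2", "model-c": "node-3"}
addresses: Dict[str, str] = {"node-1": "10.0.0.1", "node-2": "10.0.0.2", "node-3": "10.0.0.3"}


def peer_configuration(
    depends_on: Dict[str, List[str]],
    placement: Dict[str, str],
    addresses: Dict[str, str],
) -> Dict[str, List[str]]:
    """Return, for each node, the sorted addresses of the nodes it must reach."""
    peers: Dict[str, Set[str]] = {node: set() for node in placement.values()}
    for model, upstream_models in depends_on.items():
        for upstream in upstream_models:
            src, dst = placement[upstream], placement[model]
            if src != dst:  # models on the same node need no network peer
                peers[src].add(addresses[dst])
                peers[dst].add(addresses[src])
    return {node: sorted(addrs) for node, addrs in peers.items()}


config = peer_configuration(depends_on, placement, addresses)
```

In this sketch every node is configured with the addresses of the other two nodes, because each of the three models either provides output to or receives input from the models on the other nodes.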

For further explanation, FIG. 7 sets forth a flow chart illustrating another example method for distributed model execution according to embodiments of the present disclosure. The method of FIG. 7 is similar to that of FIG. 4 in that the method of FIG. 7 also includes identifying 402, for each model 110 a-n of a plurality of models 110 a-n, based on one or more execution constraints for the plurality of models 110 a-n, a corresponding node 108 a-n of a plurality of nodes 108 a-n; and deploying 404 each model 110 a-n of the plurality of models 110 a-n to the identified corresponding node 108 a-n of the plurality of nodes 108 a-n.

The method of FIG. 7 differs from FIG. 4 in that the method of FIG. 7 includes generating 702 a prediction based on a distributed execution of the plurality of models 110 a-n. The execution of the plurality of models 110 a-n is considered a distributed execution in that the models 110 a-n are executed across a plurality of distributed nodes 108 a-n. The plurality of models 110 a-n are executed interdependently in that each node 108 a-n provides output to or receives input from at least one other node 108 a-n. The prediction may be generated based on input data provided to one or more of the plurality of models 110 a-n. The prediction may be indicated in, encoded in, or embodied as output from one or more of the plurality of models 110 a-n.
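For further explanation, the interdependent execution of models to generate a prediction may be sketched as follows. The sketch is illustrative only and runs in a single process. In an actual deployment each model would execute on its own node 108 a-n and outputs would be exchanged over the configured communications pathways. The lambda functions are hypothetical stand-ins for trained models, and the dependency structure mirrors the example of FIG. 2. The sketch uses `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later).

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List

# Hypothetical stand-ins for trained models; each would run on its own node.
models: Dict[str, Callable[..., float]] = {
    "model-a": lambda x: x * 2.0,
    "model-b": lambda a: a + 1.0,
    "model-c": lambda a, b: a + b,
}
# Each model lists the models whose output it consumes.
depends_on: Dict[str, List[str]] = {
    "model-a": [],
    "model-b": ["model-a"],
    "model-c": ["model-a", "model-b"],
}


def generate_prediction(input_value: float) -> float:
    """Run the models in dependency order and return the final output."""
    outputs: Dict[str, float] = {}
    order = list(TopologicalSorter(depends_on).static_order())
    for name in order:
        upstream = [outputs[d] for d in depends_on[name]]
        # A model with no upstream model receives the external input data.
        args = upstream if upstream else [input_value]
        outputs[name] = models[name](*args)
    # The output of the last model in the order embodies the prediction.
    return outputs[order[-1]]


prediction = generate_prediction(3.0)
```

With an input of 3.0, the first stand-in model outputs 6.0, the second outputs 7.0, and the third combines both into a prediction of 13.0.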

In view of the explanations set forth above, readers will recognize that the benefits of distributed model execution include:

-   Improved performance of a computing system by identifying optimal or best fitting nodes for model deployment and execution.
-   Improved performance of a computing system by allowing for remote, distributed execution of models, leveraging more advanced hardware and computational resources than found in client systems.
-   Improved performance of a computing system by deploying models such that model dependencies, encryption relationships, and authorization requirements are preserved.

Exemplary embodiments of the present disclosure are described largely in the context of a fully functional computer system for distributed model execution. Readers of skill in the art will recognize, however, that the present disclosure also can be embodied in a computer program product disposed upon computer readable storage media for use with any suitable data processing system. Such computer readable storage media can be any storage medium for machine-readable information, including magnetic media, optical media, or other suitable media. Examples of such media include magnetic disks in hard drives or diskettes, compact disks for optical drives, magnetic tape, and others as will occur to those of skill in the art. Persons skilled in the art will immediately recognize that any computer system having suitable programming means will be capable of executing the steps of the method of the disclosure as embodied in a computer program product. Persons skilled in the art will recognize also that, although some of the exemplary embodiments described in this specification are oriented to software installed and executing on computer hardware, nevertheless, alternative embodiments implemented as firmware or as hardware are well within the scope of the present disclosure.

The present disclosure can be a system, a method, and/or a computer program product. The computer program product can include a computer readable storage medium (or media) having computer readable program instructions thereon for causing a processor to carry out aspects of the present disclosure.

The computer readable storage medium can be a tangible device that can retain and store instructions for use by an instruction execution device. The computer readable storage medium can be, for example, but is not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. A non-exhaustive list of more specific examples of the computer readable storage medium includes the following: a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM) or Flash memory, a static random access memory (SRAM), a portable compact disc read-only memory (CD-ROM), a digital versatile disk (DVD), a memory stick, a floppy disk, a mechanically encoded device such as punch-cards or raised structures in a groove having instructions recorded thereon, and any suitable combination of the foregoing. A computer readable storage medium, as used herein, is not to be construed as being transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission media (e.g., light pulses passing through a fiber-optic cable), or electrical signals transmitted through a wire.

Computer readable program instructions described herein can be downloaded to respective computing/processing devices from a computer readable storage medium or to an external computer or external storage device via a network, for example, the Internet, a local area network, a wide area network and/or a wireless network. The network can include copper transmission cables, optical transmission fibers, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers. A network adapter card or network interface in each computing/processing device receives computer readable program instructions from the network and forwards the computer readable program instructions for storage in a computer readable storage medium within the respective computing/processing device.

Computer readable program instructions for carrying out operations of the present disclosure can be assembler instructions, instruction-set-architecture (ISA) instructions, machine instructions, machine dependent instructions, microcode, firmware instructions, state-setting data, or either source code or object code written in any combination of one or more programming languages, including an object oriented programming language such as Smalltalk, C++ or the like, and conventional procedural programming languages, such as the “C” programming language or similar programming languages. The computer readable program instructions can execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the latter scenario, the remote computer can be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection can be made to an external computer (for example, through the Internet using an Internet Service Provider). In some embodiments, electronic circuitry including, for example, programmable logic circuitry, field-programmable gate arrays (FPGA), or programmable logic arrays (PLA) can execute the computer readable program instructions by utilizing state information of the computer readable program instructions to personalize the electronic circuitry, in order to perform aspects of the present disclosure.

Aspects of the present disclosure are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the disclosure. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer readable program instructions.

These computer readable program instructions can be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer readable program instructions can also be stored in a computer readable storage medium that can direct a computer, a programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer readable storage medium having instructions stored therein includes an article of manufacture including instructions which implement aspects of the function/act specified in the flowchart and/or block diagram block or blocks.

The computer readable program instructions can also be loaded onto a computer, other programmable data processing apparatus, or other device to cause a series of operational steps to be performed on the computer, other programmable apparatus or other device to produce a computer implemented process, such that the instructions which execute on the computer, other programmable apparatus, or other device implement the functions/acts specified in the flowchart and/or block diagram block or blocks.

The flowchart and block diagrams in the Figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams can represent a module, segment, or portion of instructions, which includes one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block can occur out of the order noted in the figures. For example, two blocks shown in succession can, in fact, be executed substantially concurrently, or the blocks can sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts or carry out combinations of special purpose hardware and computer instructions.

It will be understood from the foregoing description that modifications and changes can be made in various embodiments of the present disclosure. The descriptions in this specification are for purposes of illustration only and are not to be construed in a limiting sense. The scope of the present disclosure is limited only by the language of the following claims. 

What is claimed is:
 1. A method for distributed model execution, comprising: identifying, for each model of a plurality of models, based on one or more execution constraints for the plurality of models, a corresponding node of a plurality of nodes, wherein the plurality of nodes each comprise one or more computing devices or one or more virtual machines; deploying each model of the plurality of models to the corresponding node of the plurality of nodes; and wherein the plurality of models are configured to generate, based on data input to at least one model of the plurality of models, a prediction associated with the data.
 2. The method of claim 1, wherein the one or more execution constraints comprise one or more model dependencies, one or more encryption constraints, or one or more authorization constraints.
 3. The method of claim 1, further comprising generating the prediction based on a distributed execution of the plurality of models.
 4. The method of claim 1, wherein identifying the corresponding node of the plurality of nodes is based on one or more node characteristics of the plurality of nodes.
 5. The method of claim 4, wherein the one or more node characteristics comprise one or more of: one or more hardware resources of one or more of the plurality of nodes, or one or more software resources of one or more of the plurality of nodes.
 6. The method of claim 1, wherein identifying the corresponding node of the plurality of nodes is based on one or more model characteristics of the plurality of models.
 7. The method of claim 6, wherein the one or more model characteristics comprise one or more of: a data type for input data to one or more of the plurality of models, or a calculation type performed by one or more of the plurality of models.
 8. The method of claim 1, wherein identifying, for each model of the plurality of models, the corresponding node of the plurality of nodes comprises: calculating, for each model of the plurality of models, a plurality of fitness scores for the plurality of nodes; and selecting, for each model, based on the plurality of fitness scores, the corresponding node.
 9. The method of claim 1, further comprising configuring each node to communicate with at least one other node of the plurality of nodes.
 10. The method of claim 9, wherein configuring each node to communicate with at least one other node of the plurality of nodes comprises configuring each node to provide output to or receive input from at least one other node.
 11. The method of claim 1, further comprising redeploying one or more models of the plurality of models.
 12. An apparatus for distributed model execution, the apparatus configured to perform steps comprising: identifying, for each model of a plurality of models, based on one or more execution constraints for the plurality of models, a corresponding node of a plurality of nodes, wherein the plurality of nodes each comprise one or more computing devices or one or more virtual machines; deploying each model of the plurality of models to the corresponding node of the plurality of nodes; and wherein the plurality of models are configured to generate, based on data input to at least one model of the plurality of models, a prediction associated with the data.
 13. The apparatus of claim 12, wherein the one or more execution constraints comprise one or more model dependencies, one or more encryption constraints, or one or more authorization constraints.
 14. The apparatus of claim 12, wherein the steps further comprise generating the prediction based on a distributed execution of the plurality of models.
 15. The apparatus of claim 12, wherein identifying the corresponding node of the plurality of nodes is based on one or more node characteristics of the plurality of nodes.
 16. The apparatus of claim 15, wherein the one or more node characteristics comprise one or more of: one or more hardware resources of one or more of the plurality of nodes, or one or more software resources of one or more of the plurality of nodes.
 17. The apparatus of claim 12, wherein identifying the corresponding node of the plurality of nodes is based on one or more model characteristics of the plurality of models.
 18. The apparatus of claim 17, wherein the one or more model characteristics comprise one or more of: a data type for input data to one or more of the plurality of models, or a calculation type performed by one or more of the plurality of models.
 19. The apparatus of claim 12, wherein identifying, for each model of the plurality of models, the corresponding node of the plurality of nodes comprises: calculating, for each model of the plurality of models, a plurality of fitness scores for the plurality of nodes; and selecting, for each model, based on the plurality of fitness scores, the corresponding node.
 20. The apparatus of claim 12, wherein the steps further comprise configuring each node to communicate with at least one other node of the plurality of nodes.
 21. The apparatus of claim 20, wherein configuring each node to communicate with at least one other node of the plurality of nodes comprises configuring each node to provide output to or receive input from at least one other node.
 22. The apparatus of claim 12, wherein the steps further comprise redeploying one or more models of the plurality of models. 